Storage array controller for solid-state storage devices

ABSTRACT

A storage array controller provides a method and system for autonomously issuing trim commands to one or more solid-state storage devices in a storage array. The storage array controller is separate from any operating system running on a host system and separate from any controller in the solid-state storage device(s). The trim commands allow the solid-state storage device to operate more efficiently.

CROSS-REFERENCE TO RELATED APPLICATIONS

If any definitions, information, etc. from any parent or related application that are used for claim interpretation or any other purpose conflict with this description, then the definitions, information, etc. in this description shall apply.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to US Classification 711/216. The present invention relates to storage array controllers, and more particularly to storage array controllers for storage arrays that include solid-state storage devices.

2. Description of the Related Art

U.S. Pat. No. 6,480,936 describes a cache control unit for a storage apparatus.

U.S. Pat. No. 7,574,556 and U.S. Pat. No. 7,500,050 describe destaging of writes in a non-volatile cache.

U.S. Pat. No. 7,253,981 describes the re-ordering of writes in a disk controller.

U.S. Pat. No. 6,957,302 describes the use of a write stack drive in combination with a normal drive.

U.S. Pat. No. 5,893,164 describes a method of tracking incomplete writes in a disk array.

U.S. Pat. No. 6,219,289 describes a data writing apparatus for a tester to write data to a plurality of electric devices.

U.S. Pat. No. 7,318,118 describes a disk drive controller that completes some writes to flash memory of a hard disk drive for subsequent de-staging to the disk, whereas for other writes the data is written directly to disk.

U.S. Pat. No. 6,427,184 describes a disk controller that detects a sequential I/O stream from a host computer.

U.S. Pat. No. 7,216,199 describes a storage controller that continuously writes write-requested data to a stripe on a disk without using a write buffer.

US Publication 2008/0307192 describes storage address re-mapping.

BRIEF SUMMARY OF THE INVENTION

The invention includes improvements to a storage array controller for storage arrays that include solid-state storage devices. The improvements include the ability of a storage array controller to autonomously issue disk trim commands to one or more solid-state storage devices.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

So that the features of the present invention can be understood, a more detailed description of the invention, briefly summarized above, may be had by reference to typical embodiments, some of which are illustrated in the accompanying drawings. It is to be noted, however, that the accompanying drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of the scope of the invention, for the invention may admit to other equally effective embodiments. The following detailed description makes reference to the accompanying drawings, which are now briefly described.

FIG. 1 shows a computer system including a storage array controller that issues autonomous disk trim commands.

FIG. 2A shows a computer system with a storage array containing two SSDs.

FIG. 2B shows a device driver that issues autonomous disk trim commands.

FIG. 2C shows a device driver that is part of a hypervisor and that issues autonomous disk trim commands.

FIG. 2D shows a hyperdriver that is part of a hypervisor in a Windows Virtualization architecture and that issues autonomous disk trim commands.

FIG. 2E shows a hyperdriver that is part of a hypervisor in a Windows Hyper-V architecture and that issues autonomous disk trim commands.

FIG. 2F shows a hyperdriver that is part of a VMWare ESX architecture and that issues autonomous disk trim commands.

FIG. 3 shows an example of an implementation of a storage array controller that maintains a map and a freelist.

FIG. 4 shows an example of an implementation of a storage array controller that performs garbage collection and issues autonomous disk trim commands.

FIG. 5 illustrates an example of an implementation of a garbage collection algorithm.

FIG. 6 shows an example of an implementation of a storage array controller for use with one or more large-capacity SSDs and illustrates the storage structure.

FIG. 7 shows an example of an implementation of a storage array controller for use with one or more large-capacity SSDs and illustrates the use of superblocks.

FIG. 8 shows a screenshot of a BIOS Configuration Utility for a storage array controller.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the accompanying drawings and detailed description are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the accompanying claims.

DETAILED DESCRIPTION OF THE INVENTION

In the following detailed description and in the accompanying drawings, specific terminology and images are used to provide a thorough understanding. In some instances, the terminology and images may imply specific details that are not required to practice all embodiments. Similarly, the embodiments described and illustrated are representative and should not be construed as precise representations, as there are prospective variations on what is disclosed that will be obvious to someone with skill in the art. Thus this disclosure is not limited to the specific embodiments described and shown but embraces all prospective variations that fall within its scope. For brevity, not all steps may be detailed, where such details will be known to someone with skill in the art having benefit of this disclosure.

This invention focuses on storage arrays that include solid-state storage devices. The solid-state storage device will typically be a solid-state disk (SSD) and we will use an SSD in our examples, but the solid-state storage device does not have to be an SSD. An SSD may, for example, comprise flash devices, but could also comprise other forms of solid-state memory components or devices (SRAM, DRAM, MRAM, volatile, non-volatile, etc.), a combination of different types of solid-state memory components, or a combination of solid-state memory with other types of storage devices (often called a hybrid disk). Such storage arrays may additionally include hard-disk drives (HD or HDD).

This invention allows a storage array controller to autonomously issue a disk trim command. The disk trim command allows an OS to tell an SSD that the sectors specified in the disk trim command are no longer required and may be deleted. The disk trim command allows an SSD to increase performance by executing housekeeping functions, such as erasing flash blocks, that the SSD could not otherwise execute without the information in the disk trim command. The algorithms of this invention allow a storage array controller to autonomously issue disk trim commands, even though an operating system may not support the trim command. The storage array controller is logically located between the host system and one or more SSDs. An SSD contains its own SSD controller, but a storage array controller may have more resources than an SSD controller. This invention allows a storage array controller to use resources, such as larger memory size, non-volatile memory, etc., as well as unique information (because a storage array controller is higher than the SSD controller in the storage array hierarchy, i.e. further from the storage devices) in order to manage and control a storage array as well as provide information to the SSD controller.

GLOSSARY AND CONVENTIONS

Terms that are special to this field of invention or specific to this invention are defined in this description, and the first use (and usually the definition) of such special terms is highlighted in italics for the convenience of the reader. Table 1 shows a glossary for the convenience of the reader. If any information from Table 1 used for claim interpretation or any other purpose conflicts with the description text, figures or other tables, then the information in the description shall apply.

In this description there are several figures that depict similar structures with similar parts or components. For example, several figures show a disk command. Even though disk commands may be similar in several figures, the disk commands are not necessarily identical. Thus, as an example, to avoid confusion a disk command in FIG. 1 may be labeled “Disk Command (1)” and a similar, but not identical, disk command in FIG. 2 is labeled “Disk Command (2)”, etc.

TABLE 1
Glossary of Terms

Array Block Address (ABA): Combination of D and DBA.
Block: A region of a flash memory (also used for Sector).
Clean: A flash page that is not dirty.
Device Driver: Typically software that is coupled to a controller.
Dirty: A flash page that is no longer required (also invalid, obsolete).
Disk (D): Identifies a disk (may be HDD or SSD).
Disk Block Size (DBS): The block or sector size of a physical disk.
Disk Command: A command as received by a disk.
Disk Controller: The logic on a disk (HDD or SSD), as opposed to a Storage Array Controller, which is separate from a disk.
Disk Logical Block Address (DBA): The LBA that identifies the sector or block on a disk.
Disk Sector: A region of a disk (e.g. 512 bytes). See also Sector.
Disk Trim Command: Trim command received by a disk (see also Trim Command).
Field: Part of a data structure.
Flash Block: Part of a flash memory chip. Flash blocks contain flash pages.
Flash Page: Part of a flash memory chip.
Free Block (FB): A block (e.g. ABA) that is free (unused) and ready for use.
Free Superblock (FSB): A superblock in which all blocks are free (unused) blocks.
Freelist: A list of free (i.e. unused) blocks or sectors (e.g. LBAs, ABAs).
Garbage (G): A value in a data structure that indicates a block or sector is ready to be erased.
Garbage Collection (GC): Relocating data to new locations and erasing the old locations; copying flash pages to new flash blocks and erasing old flash blocks.
Granularity: An amount of storage (e.g. 512 bytes).
Hard Disk (HD): A mechanical disk, also Hard Disk Drive (HDD).
Host Block Address (HBA): The LBA used by the host to address a storage array controller.
Host Block Size (HBS): The block or sector size seen by the host.
Host Command: A command as transmitted by the host.
Host Trim Command: Trim command issued by the host system (see also Trim Command).
Logical Block Address (LBA): The address of a Logical Block.
Logical Block: A disk sector as seen by the host.
Logical Unit Number (LUN): Identifies a disk, a portion of a disk, or a portion of a collection of disks.
Map: A data structure converting storage addresses from one layer of the storage hierarchy to the next.
Operating System (OS): Software that runs on a CPU in a host system (e.g. Windows or Linux).
Physical Block Number (PBN): An address of a physical region in flash memory where data is stored.
Physical Disk Sector: Physical region on a disk where data is stored, typically 512 bytes.
Random Writes: Successive writes to random locations.
Sector: A region of a disk (e.g. 512 bytes). See also Disk Sector.
Sequential Writes: Successive writes to successive locations.
Solid-State Disk (SSD): A disk made (for example) from NAND flash memory.
SSD Controller: The disk controller that is part of an SSD (as opposed to a Storage Array Controller).
Storage Array: A collection of disks.
Storage Array Controller: A controller that sits between the OS and the disks.
Storage Command: A read, write, etc. directed to a disk.
Storage Controller: Not used, to avoid confusion with Storage Array Controller. See Disk Controller.
Storage Driver: A layer of software between the file system and a disk or other storage device.
Superblock (SB): A collection of blocks (e.g. 64 MB).
Trim Command: Tells an SSD which areas may be erased (see also Disk Trim Command).
Unmapped (X): A value in a data structure that indicates a block or sector is not in use by the host system.
Used (U): A value in a data structure that indicates a block or sector contains data.

Storage Array Controller

FIG. 1 shows an embodiment of a Storage Array Controller 108 for a Storage Array 148 that includes a Solid-State Disk (1) 116. In FIG. 1, Computer System 150 includes a Host System 102 running Operating System 158 and containing a CPU 104 that connects to a Storage Subsystem 146 using an IO Bus 106. In FIG. 1 the Storage Subsystem 146 consists of the Storage Array Controller 108 and the Storage Array 148. In FIG. 1 the Storage Array 148 includes a Solid-State Disk (1) 116 and Other Storage Array Devices 128.

In FIG. 1 the Storage Array Controller 108 contains a Storage Array Controller Chip 110. In FIG. 1 the Storage Array Controller Chip 110 contains Storage Array Controller Logic 112. In FIG. 1 the Storage Array Controller Chip 110 connects to a Storage Bus 114. In FIG. 1 the Storage Bus 114 connects to the Solid-State Disk (1) 116 and Other Storage Array Devices 128. In FIG. 1 the Other Storage Array Devices 128 consist of: Solid-State Disk (2) 152, Hard Disk (1) 154, and Hard Disk (2) 156. In FIG. 1 the Other Storage Array Devices 128 may alternatively consist of any combination of storage devices, but will typically include SSDs and/or HDDs.

In FIG. 1 the Solid-State Disk (1) 116 contains a Solid-State Disk Controller Chip 118 and Flash Memory 122. The Solid-State Disk Controller Chip 118 contains Solid-State Disk Logic 120. In FIG. 1 the Flash Memory 122 comprises a number of Disk Sectors 134. In FIG. 1 there are 16 Disk Sectors 134 numbered 00-15: Disk Sector (00) 124 to Disk Sector (15) 126. In FIG. 1 there are two Disk Sectors 134 in a Flash Page 130 and four Disk Sectors 134 in a Flash Block 132.

Other topologies for Computer System 150 are possible: CPU 104 may connect or be coupled to the IO Bus 106 via a chipset; IO Bus 106 may use a serial point-to-point topology and bus technology (such as PCI Express, InfiniBand, HyperTransport, QPI, etc.), but may also use a parallel and/or multi-drop topology and bus technology (such as PCI, etc.); Storage Bus 114 may use a parallel and/or multi-drop topology and bus technology (such as SCSI, etc.), may use a serial point-to-point topology and bus technology (such as SATA, SAS, FC, USB, Light Peak, etc.), or may use a networked protocol (such as iSCSI, FCoE, etc.); the various bus technologies used may be standard or proprietary; the various bus technologies used may be electrical, optical or wireless, etc.; portions of the system may be integrated together in a single chip or integrated package, and/or portions of the system may be in different enclosures, etc. Many uses for Computer System 150 are possible: a mass storage system, embedded device, etc. Since solid-state storage is widely used in portable electronic devices, the ideas presented here also apply when Computer System 150 is a cell phone, PDA, tablet, camera, videocamera, portable music player, other portable electronic device, or similar.

An operating system (OS) sees a storage array as a collection of disk sectors or just sectors (and sectors may also be called blocks). An SSD in a storage array may have a capacity of more than 100 Gbytes and contain tens of NAND flash memory chips. A typical 1 Gbit NAND flash memory chip may contain 1024 flash blocks, with each flash block containing 64 flash pages and each flash page containing 2 kbytes. The numbers of disk sectors, flash pages and flash blocks in FIG. 1 have been greatly reduced from typical values present in commercial products in order to simplify the description of the Storage Array Controller 108.

Disk sectors may be 512 bytes in length (and typically are in the 2010 timeframe). In FIG. 1, if the Disk Sectors 134 are 512 bytes each, then the Solid-State Disk (1) 116 has 16 Disk Sectors 134 (and thus a capacity of only 8 kbytes), with each Flash Block 132 containing only 2 kbytes and each Flash Page 130 containing only 1 kbyte. Thus the example Solid-State Disk (1) 116 of FIG. 1 is several thousand times smaller than SSDs available in the 2010 timeframe. The algorithms described here are independent of the absolute and relative sizes of the disk sectors, flash blocks and flash pages.

Note that FIG. 1 is simplified in other aspects also. For example, there may be more than one CPU 104 and more than one IO Bus 106 in the Computer System 150. The storage array configuration may be different than shown in FIG. 1. For example, the Other Storage Array Devices 128 may include hard-disk drives, solid-state disk drives, other storage devices such as storage cards, keys, etc., or other forms of storage media such as optical devices, mechanical devices, etc. There may be more than one Solid-State Disk (1) 116 in the Storage Array 148. In FIG. 1 the Disk Sectors 134 are simplified and shown as if they were separate components, but typically solid-state disks consist of many NAND flash chips and components, each of which contains many (millions of) disk sectors or flash blocks. Solid-State Disk (1) 116 may be in a form-factor that is a drop-in replacement for a hard-disk (3.5″, 2.5″ form factors, etc.) or may be in any other form-factor or with any interface (Compact Flash CF, MultiMediaCard MMC, miniSD, Memory Stick, SmartMedia, TransFlash, Secure Digital SD, PCI Express Card, etc.).

We now explain the algorithms of the Storage Array Controller 108.

Algorithm 1: a Storage Array Controller that Issues a Trim Command

FIG. 1 shows details of the Storage Array Controller Logic 112 in the Storage Array Controller 108. The Storage Array Controller Logic 112 includes two data structures: a Map (1) 136 and a Freelist (1) 138. The map contains fields: HBA, ABA, LUN, S. The freelist contains a list of free block ABAs (FB). First, these fields will be described along with other data that may be used by the Storage Array Controller Logic 112, but that is not shown in FIG. 1 for clarity. The map and freelist data structures will then be described in detail.

The sectors or blocks of a storage device are addressed as logical blocks using a logical block address (LBA). To avoid confusion, we will use host block address (HBA) for the LBA used to address a storage array controller. Unless we explicitly state otherwise, we assume that the host block size (HBS) is equal to the disk block size (DBS). The HBA may be a composite or union of: a logical unit number (LUN) that identifies a logical portion of the storage array or disk or other device in the storage array; an LBA; the virtual machine (VM), if any; a UserID that identifies the user application; a VolumeID that identifies a logical target volume; and other data that may be used for logical access or management purposes. Note that to simplify the description, clarify the figures, and in particular to make it clear that operations may be performed on different LUNs, the LUN may be shown separately from the HBA in FIG. 1 and in other figures. A disk number (D) identifies a disk or other storage device in the storage array. A disk logical block address (DBA) is the LBA that identifies the disk sector on the disk or other storage device. An array block address (ABA) is a composite or union of D and DBA, written <D, DBA>. Note that the storage array does not have to be a RAID array, JBOD, or any other particular type of storage array, but can be. The status field (S) holds the status of the disk sector corresponding to the HBA. Field S uses codes for used (U); unmapped (X); and garbage (G), i.e. ready for garbage collection. We will describe the terms garbage and garbage collection in detail shortly. Field S may use other codes, or other functions for the codes, but for clarity no other codes are shown in FIG. 1. The free blocks (FB) in Freelist (1) 138 are ABAs that are free for use.

A disk controller for an HDD or SSD maintains the relationship between an ABA (or the DBA portion of the ABA) and the disk sectors that are physically part of a storage device (often called the physical disk sectors or physical sectors). In exactly the same way, the Solid-State Disk Logic 120 maintains the relationship between an ABA and the physical block number (PBN) of an SSD. The PBN of an SSD is analogous to the physical disk sector of an HDD. Due to resource constraints, SSDs often manage the PBNs at a coarser granularity than disk sectors. Normally a disk command contains an LBA provided by the host, but in the presence of a storage array controller the disk command contains an ABA provided by the storage array controller. Note that in FIG. 1 there are 16 Disk Sectors 134 numbered from Disk Sector (00) 124 to Disk Sector (15) 126 on the Solid-State Disk (1) 116. There are 16 ABAs that correspond to these 16 disk sectors, but the Solid-State Disk Logic 120 continuously changes the relationship between the ABAs and the disk sectors. In the example of FIG. 1, 12 of the 16 possible HBAs are in Map (1) 136 and four ABAs (02, 13, 14, 15) are on the Freelist (1) 138.

Because the terms just described can be confusing, we summarize the above again briefly. With just a single disk, the host provides an LBA directly to the disk; the disk controller converts the LBA to the physical disk sector (for an HDD) or to the PBN (for an SSD). In the presence of a storage array controller the host still provides an LBA, but now to the storage array controller (and thus we call the LBA an HBA to avoid confusion); the storage array controller then maps this HBA to an ABA and provides the ABA (or possibly just the DBA portion of the ABA) to the disk; the disk (HDD or SSD) then converts this DBA or ABA (treating the DBA portion of the ABA as though it were just an LBA, which it is) to a physical disk address: either the physical disk sector (for an HDD) or PBN (for an SSD).

It is important to understand the additional layer of hierarchy that a storage array controller introduces. The storage hierarchy of FIG. 1 has the following layers: (i) Operating System 158; (ii) Storage Array Controller 108; (iii) Storage Array 148. In FIG. 1 the Storage Array Controller 108 has a higher position in the hierarchy than Solid-State Disk (1) 116, i.e. is further from the storage devices. In FIG. 1 the Storage Array Controller 108 adds a level of indirection (i.e. adds a map or re-map of data) between Host System 102 and Storage Array 148. In FIG. 1 the Storage Array Controller 108 may also add additional resources over and above those of Solid-State Disk (1) 116.

Although we will define structures and their functions, operations and algorithms in terms of software operations, code and pseudo-code, it should be noted that the algorithms may be performed in hardware; software; firmware; microcode; a combination of hardware, software, firmware or microcode; or in any other manner that performs the same function and/or has the same effect. The data structures, or parts of them, may be stored in the storage array controller in SRAM, DRAM, embedded flash, or other memory. The data structures, or parts of them, may also be stored outside the storage array controller, for example on any of the storage devices of a storage array (the local storage or remote storage, i.e. remote from the storage array connected to the storage array controller) or on a host system (the local host or a remote host, i.e. remote from the host connected to the storage array controller). For example, FIG. 1 shows the Storage Array Controller 108 containing a Storage Array Controller Chip 110 and the Storage Array Controller Logic 112. Alternative implementations are possible: (i) the Storage Array Controller Logic 112 may be completely in hardware, completely in software, or partly hardware and partly software, and may be in any location, on the host or remote, for example; (ii) the Storage Array Controller Logic 112 may not physically be in the Storage Array Controller Chip 110 or in the Storage Array Controller 108; (iii) the Storage Array Controller Chip 110 may be implemented as a chip, an ASIC, an FPGA or equivalent, a combination of such components, or may be a combination of hardware and software; (iv) the Storage Array Controller Chip 110 may be a portion (or portions) of a larger chipset, IO controller, processor, etc. A part of this invention is the logical placement of the storage array controller functions and algorithms between the operating system and a storage array.

We will now define the data structures (including the map and the freelist) that we will use. A map hr_map is defined between the HBAs and ABAs as hr_map[hba]->aba. Thus hr_map takes an HBA as input and returns an ABA. We say that the HBA maps to that ABA (we can also say that the storage array controller maps or re-maps data from the operating system). A special symbol or bit (for example, we have used X in the Map (1) 136 of FIG. 1) may indicate that an entry in hr_map[hba] is unmapped, and/or we can use a special table entry (for example, we have used a LUN of zero in the Map (1) 136 of FIG. 1) to indicate an entry in hr_map[hba] is unmapped. The Freelist (1) 138 uses a structure aba_free. Note that Map (1) 136 in FIG. 1 is used to map from HBA to ABA for every host command that addresses a storage device: reads, writes, etc. This is true for all of the maps in the examples described here.
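For illustration only, the following is a minimal sketch of these data structures in Python (the representation, the size of 16 entries, and the use of None as the unmapped marker are our own choices for this sketch, not part of any standard or claimed embodiment):

    # Map and freelist of the storage array controller (illustrative sketch).
    # hr_map[hba] -> aba; None plays the role of the unmapped marker X of FIG. 1.
    NUM_BLOCKS = 16

    hr_map = {hba: None for hba in range(NUM_BLOCKS)}  # every HBA starts unmapped
    aba_free = list(range(NUM_BLOCKS))                 # every ABA starts free

    def lookup(hba):
        """Translate a host block address to an array block address."""
        aba = hr_map[hba]
        if aba is None:
            raise KeyError("HBA %d is unmapped" % hba)
        return aba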

We have used the term storage array controller throughout this description rather than storage controller or disk controller. In FIG. 1 the Storage Array Controller 108 is separate from any disk controllers that are part of the storage devices that form the Storage Array 148. As shown in FIG. 1, the Storage Array Controller 108, the Storage Array Controller Chip 110, and the Storage Array Controller Logic 112 are all separate from the Solid-State Disk Controller Chip 118 and Solid-State Disk Logic 120 typically used by the Solid-State Disk (1) 116.

A storage command is directed to a storage device and specifies an operation, such as read, write, etc. A storage command is more commonly called a disk command or just command, a term we will avoid using in isolation to avoid confusion. To avoid such confusion we will use storage command when we are talking about commands in general; but we will save disk command (or disk write, etc.) for the command as it arrives at (or is received by) the disk (either SSD or HDD, usually via a standard interface or storage bus, such as SATA); and we will use the term host command (or host write, etc.) for the command as it leaves (or is transmitted by) the OS. A disk command may be the same as a host command when there is a direct connection between the OS on a host system and a single disk.

The algorithms and operations described below use a disk trim command (trim command or just trim are also commonly used). A disk trim command was proposed to the disk drive industry in the 2007 timeframe and introduced in the 2009 timeframe. One such disk trim command is a standard storage command, part of the ATA interface standard, and is intended for use with an SSD. A disk trim command is issued to the SSD; the disk trim command specifies a number of disk sectors on the SSD using data ranges and LBAs (or, as we have explained already, using ABAs or the DBAs contained in ABAs in the presence of a storage array controller); and the disk trim command is directed to the specified disk sectors. The disk trim command allows an OS to tell an SSD that the disk sectors specified in the trim command are no longer required and may be deleted or erased. The disk trim command allows the SSD to increase performance by executing housekeeping functions, such as erasing flash blocks, that the SSD could not otherwise execute without the information in the disk trim command.

It should be noted from the above explanation and our earlier discussion of ABAs that, for example, when we say “place an ABA in a disk trim command,” the disk trim command may actually require an LBA (if it is a standard ATA command, for example), and that LBA is the DBA portion of the ABA. To simplify the description we may thus refer to an LBA, DBA and ABA as referring to the same block address, and thus meaning the same thing, at the disk level.

Although the disk trim command and other storage commands have fixed and well-specified formats, in practice they may be complicated, with many long fields and complex appearance. Storage commands may also vary in format depending on the type of storage bus, for example. We will simplify storage commands and other commands in the figures in order to simplify the description (and the format of the storage commands may also vary between different figures and different examples). The algorithms described here are intended to work with any standard or proprietary command set, even though a command shown in a figure in this description may not exactly follow any one standard format, for example.

We now describe Algorithm 1, which allows the Storage Array Controller 108 of FIG. 1, rather than Operating System 158, to autonomously issue a disk trim command that is directed to unused disk sectors on the Solid-State Disk (1) 116. In FIG. 1 we have used a large arrow to depict and show the flow of an Autonomous Disk Trim Command 144 between Storage Array Controller 108 and Solid-State Disk (1) 116 (and will use this same depiction in other figures).

We say the Storage Array Controller 108 autonomously issues the disk trim command, or issues the disk trim command in an autonomous fashion or in an autonomous manner, or issues autonomous disk trim commands. We use the term autonomous or autonomously here to describe the fact that it is the Storage Array Controller 108 that initiates, originates, or instigates the disk trim command and generates or creates the contents of all (or part) of the disk trim command, rather than, for example, Operating System 158 on Host System 102.

Algorithm 1 may be used in a situation where Operating System 158 on Host System 102 does not support the disk trim command (or does not support the disk trim operation). Algorithm 1 may also be used in a situation where Operating System 158 on Host System 102 is unaware of the physical details of the Storage Array 148. Algorithm 1 may be used, for example, in the situation where the sum capacity of the LUNs presented to Operating System 158 on Host System 102 is smaller than the sum capacity of the Storage Array 148. This situation may occur, as an example, because an OS is in a virtual machine and the storage array is being shared by multiple virtual machines. There are, however, many reasons, including the use of storage management; use of a Guest OS; virtualization of machines; remote, NAS and SAN storage arrays; storage virtualization; and other datacenter functions, that may cause Operating System 158 on Host System 102 to be unable to, or unaware that it can, issue a disk trim command to a Solid-State Disk (1) 116 in the attached Storage Array 148.

Algorithm 1: trim_aba

Step 1. Assume valid HBAs map to a fixed subset of ABAs in hr_map.
Step 2. Issue a disk trim command to ABAs in aba_free that are not mapped to by valid HBAs.
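As a non-limiting sketch, Algorithm 1 might be rendered in Python as follows, reusing the hr_map and aba_free structures sketched earlier; the issue_disk_trim() helper is hypothetical and stands in for transmitting the command on the storage bus:

    def trim_unmapped_abas(hr_map, aba_free, issue_disk_trim):
        """Algorithm 1: autonomously trim free ABAs not mapped by any valid HBA."""
        # Step 1: valid HBAs map to a fixed subset of ABAs in hr_map.
        mapped_abas = {aba for aba in hr_map.values() if aba is not None}
        # Step 2: issue a disk trim command to free ABAs outside that subset.
        trim_targets = [aba for aba in aba_free if aba not in mapped_abas]
        if trim_targets:
            issue_disk_trim(trim_targets)  # e.g. ABAs 02, 13, 14, 15 in FIG. 1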

In FIG. 1 the Freelist (1) 138 in the Storage Array Controller Logic 112 contains ABAs 02, 13, 14, and 15 in aba_free (and these ABAs are therefore not present in Map (1) 136). The Storage Array Controller 108 may use Algorithm 1 to autonomously issue a disk trim command to Solid-State Disk (1) 116, as shown in FIG. 1 by Disk Commands (1) 140. The Disk Trim Command (1) 142 contains: RCMD#, the command number; RCMD, the command (T for trim); and four ABA data range fields (ABA1-ABA4) specifying the ABAs 02, 13, 14, 15. The information in Disk Trim Command (1) 142 may then be used by Solid-State Disk (1) 116.

Note that Disk Trim Command (1) 142 shows the same information content that an industry-standard disk trim command contains, but is not necessarily in the exact format used, for example, by the ATA industry standard.

Note that alternative implementations for Algorithm 1 may include the following: (i) multiple disk trim commands may be combined; (ii) if Operating System 158 in FIG. 1 supports a trim command, then one or more host trim commands from the Host System 102 may be combined or merged with one or more trim commands autonomously generated by Storage Array Controller 108 to form the disk trim command(s) (we are careful to distinguish host trim commands, which are from the host, from disk trim commands, which are received by the disk); (iii) the map hr_map may be compressed or condensed by mapping regions larger than a disk sector (e.g. a LUN); (iv) the map hr_map may be compressed or condensed by using groups or collections of ABAs rather than individual ABAs; (v) any of the alternative implementations of the other algorithms in this description.

One feature of Algorithm 1 is for a storage array controller to set aside, as unused, a portion (or portions) of an SSD (or SSDs) in a storage array. Thus the sum of the LUNs presented to the host system is smaller than the capacity of the storage array. The storage array controller may then autonomously issue disk trim command(s) to the unused portion(s) of an SSD (or SSDs). An SSD may then use the information in the disk trim command to erase or delete flash blocks. The ability to erase or delete flash blocks improves the SSD performance and improves the SSD reliability.
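A possible (non-limiting) rendering of this feature, assuming a total capacity of 16 ABAs of which only 12 are presented to the host as LUNs, and reusing the hypothetical issue_disk_trim() helper from the sketch above:

    # Set aside the top quarter of the SSD as unused and trim it once at setup.
    TOTAL_ABAS = 16      # capacity of the SSD
    PRESENTED_ABAS = 12  # sum of the LUNs presented to the host

    reserved = list(range(PRESENTED_ABAS, TOTAL_ABAS))  # ABAs 12-15, never mapped
    issue_disk_trim(reserved)  # the SSD may now erase the matching flash blocks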

It is important to note that the Storage Array Controller Logic 112 is (i) separate from the Solid-State Disk Logic 120 typically used by the Solid-State Disk Controller Chip 118 and (ii) separate from Operating System 158.

A storage array controller performs certain functions instead of (or in addition to) an OS running on a host system; and a storage array controller also performs certain functions instead of (or in addition to) an SSD controller(s) in a storage array. A storage array controller is logically located between a host system and an SSD. An SSD contains its own SSD controller, but a storage array controller may have more resources than an SSD controller. The algorithms described here allow a storage array controller to use resources, such as larger memory size, non-volatile memory, etc., as well as unique information (because a storage array controller is higher than an SSD controller in a storage array hierarchy, i.e. further from the storage devices) in order to manage and control a storage array as well as provide information to an SSD controller. For example, a storage array controller is aware of LUNs but an SSD controller is not. This hierarchical management approach has other advantages and potential uses that are explained throughout this description in the forms of various algorithms that may be employed by themselves or in combination.

Algorithm 1 illustrates the operation of the Storage Array Controller Logic 112 in the Storage Array Controller 108. The description of Algorithm 1 is useful before we describe more complex algorithms that include host write commands and other storage array functions. These more complex algorithms show how Freelist (1) 138 in FIG. 1 is generated and how Map (1) 136 is changed. Before we discuss these other algorithms we will describe alternative implementations of Algorithm 1.

Alternative Implementations and Structures

FIGS. 2A-2F show alternative implementations and alternative structures with reference to Algorithm 1.

FIG. 2A shows a serial storage bus. The main elements of FIG. 2A are similar to those of FIG. 1. In FIG. 2A we have shown the Serial Storage Bus (1) 214 as a serial point-to-point bus (in contrast to the parallel multi-drop bus of FIG. 1). In FIG. 2A Solid-State Disk (3) 230 has a serial interface to a Serial Storage Bus (1) 214 (and thus we have given it a different label than in FIG. 1, where Solid-State Disk (1) 116 had a different, parallel, interface). In FIG. 2A Other Storage Array Devices 128 are linked to the Storage Array Controller Logic 112 by Serial Storage Bus (2) 216 and Serial Storage Bus (3) 218. In FIG. 2A the Other Storage Array Devices 128 consist of: Solid-State Disk (4) 232 and Hard Disk (3) 234. In FIG. 2A the Storage Array Controller 108 issues an Autonomous Disk Trim Command 144 as described previously with reference to FIG. 1.

Note that the various storage-array configuration alternatives, as well as other various possibilities for the storage array configuration(s), storage bus(es), and various storage device(s), will not necessarily be shown in all of the figures in order to simplify the description.

FIG. 2B shows a device driver. A device driver is typically (though not necessarily) software that may be (but not necessarily) manufactured with and sold with a storage array controller. (In different implementations the device driver may be implemented in software, hardware, firmware or a combination; and may be designed, manufactured and/or sold separately.) The main elements of FIG. 2B are similar to those of FIG. 1. FIG. 2B does not show other storage devices in a storage array, but they could be present as was shown in FIG. 2A. In FIG. 2B Computer System 150 includes Host System 102 containing a CPU 104 that runs Software 238. In FIG. 2B Software 238 includes: Operating System 158, File System 226 and Device Driver 228. In FIG. 2B Device Driver 228 is connected to IO Bus 106 via Software Bus 240 (shown as dashed to represent the fact that the software-to-hardware connection is a logical connection or coupling and not a direct electrical connection). In FIG. 2B Device Driver 228 includes Device Driver Logic 236. In FIG. 2B the Device Driver 228 is separate from Operating System 158. In FIG. 2B the Device Driver 228 is logically connected or coupled to Storage Array Controller 108. In FIG. 2B Device Driver Logic 236 is logically part of Storage Array Controller Logic 112. In FIG. 2B Device Driver Logic 236, logically connected or coupled to Storage Array Controller Logic 112, issues the Autonomous Disk Trim Command 144.

FIG. 2C shows a computer system with multiple virtual machines (VMs), each VM containing an operating system, and a hypervisor. FIG. 2C does not show other storage devices in a storage array, but they could be present. In FIG. 2C there are two operating systems (or more than two, as shown figuratively by the dots) running as VMs in CPU 104: Operating System 1 may be a Host OS and Operating System 2 may be a Guest OS, for example. In FIG. 2C each operating system has a file system and a storage driver (and possibly more than one storage driver). The file system (sometimes considered part of an OS) translates or converts from file-based access (in terms of directories, file names and offsets) to disk-based access (in terms of LBAs). The storage driver (sometimes considered part of an OS) is responsible for handling a disk or other storage device. The storage driver is usually (but not always) separate and distinct from Device Driver 228. In FIG. 2C Device Driver 228 is part of Hypervisor 242 and logically connected or coupled to storage drivers through Software Bus (2) 244. In FIG. 2C Device Driver 228 contains Device Driver Logic 236. In FIG. 2C Device Driver Logic 236, logically connected or coupled to Storage Array Controller Logic 112, issues the Autonomous Disk Trim Command 144.

FIG. 2D shows a computer system that is typical of the Windows Hypervisor, Virtualization Stack and Device Virtualization architectures from Microsoft Corporation. In FIG. 2D we have shown the Device Driver 228 of FIG. 2C as a hyperdriver, a general term that we will use to denote a device driver in a hypervisor. In FIG. 2D the Device Driver Logic 236 is part of Hyperdriver 246. In FIG. 2D the Hyperdriver 246, logically connected or coupled to Storage Array Controller Logic 112, issues the Autonomous Disk Trim Command 144. In FIG. 2D the Device Driver Logic 236 may also be implemented in the Parent Partition as part of the Kernel. In such an implementation the Autonomous Disk Trim Command 144 originates in the Kernel.

FIG. 2E shows a computer system that is typical of the Microsoft Hyper-V architecture, showing Virtualization Service Providers (VSPs) and Virtualization Service Consumers (VSCs). In FIG. 2E the Hyperdriver 246, logically connected or coupled to Storage Array Controller Logic 112, issues the Autonomous Disk Trim Command 144. In FIG. 2E the Device Driver Logic 236 may also be implemented in the Parent Partition. In such an implementation the Autonomous Disk Trim Command 144 originates in the Parent Partition.

FIG. 2F shows a computer system that is typical of the ESX product available from VMWare and contains a Virtual Machine Kernel (VMkernel) and Virtual Machine Host-Bus Adapter (VMHBA). In FIG. 2F the Device Driver Logic 236 is part of Hyperdriver 246. In FIG. 2F the Hyperdriver 246, logically connected or coupled to Storage Array Controller Logic 112, issues the Autonomous Disk Trim Command 144.

Note that the Device Driver 228 (and thus Device Driver Logic 236) and Storage Array Controller 108 (and thus Storage Array Controller Chip 110 and Storage Array Controller Logic 112) are: (i) separate from the Solid-State Disk Logic 120 used by the Solid-State Disk Controller Chip 118 and (ii) separate from Operating System 158 (or storage-driver software that may be considered part of Operating System 158).

Note that in the following examples and implementations we may simplify descriptions by showing Storage Array Controller 108 (with Storage Array Controller Chip 110 and Storage Array Controller Logic 112) as issuing the autonomous disk trim command (just as we described with reference to FIG. 1). It should now be clear from the description provided with reference to FIGS. 2B-2F that a device driver, hyperdriver, or other software may also be used in any of the implementations that are described here. Thus, when we refer to Storage Array Controller Logic 112 (implemented in hardware, software, firmware, or a combination of these, in Storage Array Controller Chip 110) performing some function, recognize that the function may be performed by a combination of Storage Array Controller Logic 112 and Device Driver Logic 236 (implemented in hardware, software, firmware, or a combination of these).

Algorithm 2: Storage Array Controller that Maintains a Map and a Freelist

We will now describe Algorithm 2, which builds on Algorithm 1 and shows how a freelist and map are used. FIG. 3 shows an example of a storage array controller that autonomously issues a disk trim command. The main elements of FIG. 3 are similar to FIG. 1. In FIG. 3 the Host Write Commands (3) 300 include: HCMD#, the host command number; HCMD, the host command (in this example all host commands are W or writes); HBA, the host LBA; LUN; and HDATA, the data in the host command. Note that we have stylized the write data as characters G-J to simplify the description. Note that the Host Write Commands (3) 300 are stylized and simplified versions of what a particular host command, in this case a write command, may look like, with the same information content that an industry-standard write command contains (e.g. in a commercial embodiment following an industry standard); but they are not necessarily in the exact format used, for example, by the ATA industry standard. Note also that there are other disk commands and other host commands than write commands, and that other forms of all such commands are also possible. In FIG. 3 the writes in Disk Commands (3) 306 are generated from the Host Write Commands (3) 300, but use ABAs instead of HBAs. The Storage Array Controller 108 maps from the HBAs to the ABAs using the Storage Array Controller Logic 112. Additional elements in FIG. 3 illustrate the various states of the data structures that we will use and describe in detail below.

  Algorithm 2: get_write_aba_with_trim(hba)
    // Get the old ABA; get a new ABA; update the map
    old_aba = hr_map[hba]
    new_aba = aba_free.pop()
    hr_map[hba] = new_aba
    // Issue a disk trim command to the old ABA; update the freelist
    if (old_aba != empty) then trim(old_aba)  // Algorithm 1 or equivalent
    if (old_aba != empty) then aba_free.push(old_aba)
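The same steps as runnable Python, reusing the structures from the earlier sketches (a sketch under our assumptions, not the exact commercial embodiment):

    def get_write_aba_with_trim(hba, hr_map, aba_free, issue_disk_trim):
        """Algorithm 2: remap a written HBA to a fresh ABA and trim the old ABA."""
        old_aba = hr_map.get(hba)       # None if the HBA was unmapped
        new_aba = aba_free.pop(0)       # take a free ABA from the head of the freelist
        hr_map[hba] = new_aba           # update the map
        if old_aba is not None:
            issue_disk_trim([old_aba])  # Algorithm 1 or equivalent
            aba_free.append(old_aba)    # return the old ABA to the tail of the freelist
        return new_aba

For the first write of FIG. 3 (HBA 00, previously mapped to ABA 00, with ABA 08 at the head of the freelist), the call returns new ABA 08 and pushes old ABA 00 onto the tail of the freelist, consistent with Freelist (3b) 310.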

Alternative implementations of Algorithm 2 may include some or all of the following: (i) an asynchronous disk trim command (i.e. the disk trim command is generated at a different time to that described above and to other events); (ii) a disk trim command may specify multiple disk sectors (using multiple data ranges); (iii) any type of storage array including one or more SSDs; (iv) any of the alternative implementations of the other algorithms in this description; (v) ordering the freelist to increase the likelihood that writes to the SSD are to sequential ABAs (even though the HBAs may be to random addresses).

FIG. 3 shows the Map (3a) 302 before any of the writes in the figure are executed and the Map (3b) 304 after the last write shown is executed (i.e. the label “3b” denotes a later time than the label “3a”). In FIG. 3 the Host System 102 sees eight disk sectors, the map contains eight HBAs, and the freelist contains eight ABAs (small numbers are used to simplify the description). In FIG. 3 the Host Write Commands (3) 300 will be to HBAs 00, 06, 05, 01. In FIG. 3, the Map (3a) 302 shows HBAs 00, 06, 05, 01 are used (U) and the corresponding used ABAs are 00, 01, 03, 05. In FIG. 3, the Freelist (3a) 308 contains ABAs 08, 09, 10, 11, 12, 13, 14, 15. The four writes HCMD#1-4 then execute. As a result of the four writes, the four old ABAs 00, 01, 03, 05 in Map (3b) 304 are replaced with four new ABAs: the first four blocks from the freelist, 08, 09, 10, 11.

An old array block address (old ABA) is thus an ABA that is no longer required, containing data that is no longer useful or required; a new ABA is an ABA, taken from a freelist, that replaces an old ABA and does contain data that is useful or required.

In FIG. 3 the first command, Disk Write Command (3) 318, with RCMD#=1 in the Disk Commands (3) 306, is a sequential write of data G-J to ABAs 08-11. Disk Trim Command (3) 320, with RCMD#=2, is then autonomously issued by the Storage Array Controller 108 specifying the old ABAs 00, 01, 03, 05. In FIG. 3 the Freelist (3b) 310 now contains ABAs 12, 13, 14, 15, 00, 01, 03, 05 (we have left these ABAs unordered to show more clearly the order in which the ABAs were added).

Typically an erase of Flash Memory 122 is performed a block at a time, as shown by E in the Erased Flash Block 312 in FIG. 3. Typically a write to Flash Memory 122 is performed a page at a time, as shown by W in the Write to Flash Page 314 in FIG. 3. As a result of Disk Trim Command (3) 320, Solid-State Disk (1) 116 may now perform housekeeping (i.e. delete, free, erase, garbage collection, etc. on flash blocks). For example, Solid-State Disk Logic 120 may contain Solid-State Disk Data Structure 316, which shows that a physical flash block corresponding to ABAs 00, 01, 03, 05 and consisting of disk sectors 04, 05, 06, 07 (marked Y, for Yes, in the trim field) may be erased as a result of Disk Trim Command (3) 320. By autonomously issuing disk trim commands, the Storage Array Controller 108 allows the Solid-State Disk (1) 116 to increase the efficiency of write and erase operations.

One feature of Algorithm 2 is for a storage array controller to maintain a map (i.e. map or re-map data) between host and disk(s) and to autonomously issue disk trim commands to the SSD(s) directed to old ABAs.

Algorithm 3: Storage Array Controller that Performs Garbage Collection

We will now describe Algorithm 3, which is based on Algorithm 2 and operates on large groups of sectors called superblocks. FIG. 4 shows the Flow Chart 400 of an algorithm, as well as the associated Data Structures 402, for a write loop that performs garbage collection; uses a superblock for writes; uses a superblock freelist containing free superblocks; and autonomously issues disk trim commands to superblocks. This write loop forms part of an implementation of the Storage Array Controller Logic 112 in Storage Array Controller 108. We will also use Algorithm 3 presently as part of a more complex algorithm.

First we describe garbage collection. In the context of solid-state storage, typically flash memory, when a flash page (or some other portion) of a storage device is no longer required (i.e. it is obsolete, no longer valid, or is invalid) that flash page is marked as dirty. When an entire flash block (typically between 16 and 256 flash pages) is dirty, the entire flash block is erased and the free space reclaimed. If free space on the device is low, a flash block is chosen that has some dirty flash pages and some clean (i.e. not dirty, good, or valid) flash pages. The clean flash pages are transferred (i.e. written, moved or copied) to a new flash block. All the original clean flash pages are marked as dirty and the old flash block is erased. In the context of solid-state storage, this process of transferring flash pages to new flash blocks and erasing old flash blocks is called garbage collection. The exact technique used for garbage collection, well-known to someone skilled in the art, is not a key part of the algorithms described here. One key idea is that garbage collection is being performed by the storage array controller. We present Algorithm 3 first and then describe each of the steps.
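A simplified sketch of this style of garbage collection in Python; the Page and Block types and the erase_block() and write_pages() callables are our own stand-ins for the corresponding flash operations, not any particular SSD's interface:

    from dataclasses import dataclass

    @dataclass
    class Page:
        dirty: bool = False

    @dataclass
    class Block:
        pages: list

    def garbage_collect(blocks, erase_block, write_pages):
        """Relocate clean pages out of partly dirty blocks, then erase the blocks."""
        for block in blocks:
            if all(p.dirty for p in block.pages):
                erase_block(block)          # entirely dirty: erase and reclaim
            elif any(p.dirty for p in block.pages):
                clean = [p for p in block.pages if not p.dirty]
                write_pages(clean)          # copy clean pages to a new flash block
                for p in clean:
                    p.dirty = True          # the originals are now dirty
                erase_block(block)          # the old flash block can be erased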

Algorithm 3: get_write_aba_with_GC(hba)

Step 3.0: Write Loop. Process input host write commands. Go to Step 3.1.
Step 3.1: A host write command arrives at the storage array controller. The storage array controller adds the host write command fields (HBA plus HDATA) to a superblock write buffer. Go to Step 3.2.
Step 3.2: Check if the superblock write buffer is full. No: Go to Step 3.1. Yes: Go to Step 3.3.
Step 3.3: Check if we have enough ABAs in the freelist to fill a free superblock. No: Go to Step 3.4. Yes: Go to Step 3.5.
Step 3.4: Perform freelist_tidy to create a free superblock. Go to Step 3.5.
Step 3.5: Update hr_map. Go to Step 3.6. // Similar to Algorithm 2 or equivalent
Step 3.6: Write the entire superblock to disk. Go to Step 3.7.
Step 3.7: End of Write Loop. Go to Step 3.0.
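As a compact sketch of this loop in Python (the helper callables are hypothetical stand-ins for Steps 3.3-3.6; find_free_superblock() is sketched below, after the Step 3.3 details):

    def make_write_loop(aba_free, find_free_superblock, freelist_tidy,
                        take_free_superblock, update_map, write_superblock,
                        sb_size=8):
        """Return a handler implementing Steps 3.1-3.7 of Algorithm 3."""
        write_buffer = []                                # superblock write buffer

        def on_host_write(hba, hdata):
            write_buffer.append((hba, hdata))            # Step 3.1: buffer the write
            if len(write_buffer) < sb_size:              # Step 3.2: buffer full yet?
                return                                   # no: wait for more writes
            if find_free_superblock(aba_free, sb_size) is None:  # Step 3.3
                freelist_tidy()                          # Step 3.4: GC creates one
            sb_start = take_free_superblock(aba_free)    # aligned, contiguous ABAs
            update_map(write_buffer, sb_start)           # Step 3.5: remap the HBAs
            write_superblock(sb_start, write_buffer)     # Step 3.6: one disk write
            write_buffer.clear()                         # Step 3.7: back to Step 3.0

        return on_host_write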

We will now describe the steps in Algorithm 3 and the data structures shown in FIG. 4 in more detail.

Step 3.1 details: In FIG. 4 the Superblock Write Buffer 406 holds the HBA and HDATA for multiple host write commands. In FIG. 4 Host Write Command (4) 404 is a single host write command to HBA=01 with HDATA=G. In FIG. 4 we have stylized the write data to simplify the description, with characters G-N each representing a disk sector of data. In one iteration through Step 3.1, as shown by the arrow labeled Step 3.1 in FIG. 4, HDATA=G has been added to Superblock Write Buffer 406.

Step 3.2 details: In FIG. 4 the Superblock Write Buffer 406 holds eight disk sectors of data and is full (with HDATA=G-N, or eight disk sectors of our stylized data), and thus we will next go to Step 3.3.

Step 3.3 details: FIG. 4 shows Freelist (4) 416 contains 48 ABAs (ordered by ABA). In FIG. 4 blanks in the tabular representation of the Freelist (4) 416 highlight the ABAs that are missing from contiguous ranges. In FIG. 4 we can thus see the blanks correspond to ABAs 05, 18, 22, 26, 41 that are not on Freelist (4) 416 because they are in Map (4) 412 (ABAs 05, 18, 22, 41 are shown; ABA 26 is not). In FIG. 4 a Free Superblock (1) 414 always contains eight ABAs that: (i) are contiguous (i.e. sequential and in a continuous range); (ii) start with an ABA that is aligned to a superblock boundary (i.e. the starting ABA is a multiple of eight; we also call this the superblock address); (iii) are located on the same disk. In FIG. 4 the eight ABAs in a Free Superblock (1) 414 correspond to eight disk sectors of data and to the size of Superblock Write Buffer 406. FIG. 4 shows that Freelist (4) 416 contains Free Superblock (1) 414 (ABAs 08-15), thus we will go to Step 3.5 next. The arrow labeled Step 3.3 in FIG. 4 shows that Free Superblock (1) 414 with starting address ABA 08 is used in the ABA field in Disk Write Command (4) 410.
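The Step 3.3 check might look as follows in Python, assuming the freelist holds integer ABAs and using the conditions above (contiguous and aligned; the single-disk case lets us ignore the same-disk condition here):

    def find_free_superblock(aba_free, sb_size=8):
        """Return the starting ABA of a free superblock, or None if none exists."""
        free = set(aba_free)
        if not free:
            return None
        # Scan superblock-aligned starting ABAs (multiples of sb_size).
        for start in range(0, max(free) + 1, sb_size):
            if all(aba in free for aba in range(start, start + sb_size)):
                return start  # e.g. ABA 08 for Free Superblock (1) 414 in FIG. 4
        return None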

Step 3.4 details: freelist_tidy performs garbage collection to produce a free superblock. In Map (4) 412 HBA 04 is marked for garbage collection with S=G. The garbage collection process in freelist_tidy can thus add ABA 05 to Freelist (4) 416 (as shown by the arrow labeled Step 3.4a in FIG. 4). When ABA 05 is added to Freelist (4) 416 a free superblock will be created that contains ABAs 00-07 (this step is not shown in FIG. 4). To illustrate the process we have shown Free Superblock (2) 418 that has already been created. Free Superblock (2) 418 contains ABAs 32-39, and as a result we can autonomously issue Disk Trim Command (4) 420 directed at a superblock of eight ABAs (shown by the arrow labeled Step 3.4b in FIG. 4).

Step 3.5 details: To describe how we update map hr_map we focus on the first entry in Superblock Write Buffer 406 (corresponding to Host Write Command (4) 404 to HBA=01) in FIG. 4. We see from Map (4) 412 in FIG. 4 that HBA 01 is currently mapped to ABA 22 (in the row labeled Step 3.5). We take the first ABA from Free Superblock (1) 414 as a new ABA (from FIG. 4, this is ABA 08, new_aba=08). We will update hr_map to map HBA 01 to new ABA 08 (this is not shown in FIG. 4). We will mark new ABA 08 with S=U (this is not shown in FIG. 4). We will mark old ABA 22 with S=G (old_aba=22; this is not shown in FIG. 4). We then continue updating hr_map with the next write (from FIG. 4, the next write is to HBA=03) and so on. The map update process was described in Algorithm 2.

Step 3.6 details: In FIG. 4 the Disk Write Command (4) 410 is a write command to ABAs 08-15 and contains a superblock of data G-N from Superblock Write Buffer 406 (as shown by the arrow labeled Step 3.6 in FIG. 4).

Alternative implementations for Algorithm 3 may include one or more of the following: (i) Step 3.4 freelist_tidy may be performed asynchronously (i.e. at a different time) to any write commands so that at most times (and preferably at all times) there is at least one free superblock; (ii) in practice a superblock (and free superblock) will be much larger than the disk sector size, flash block size, or flash page size and could be 32 Mbytes, or more, for example; (iii) if the SSD capacity is 100 Gbyte and a superblock is 1 Gbyte, then to avoid filling the disk we might inform the OS that the SSD capacity is 99 Gbyte, for example; (iv) a superblock may contain elements at any granularity or size: for example, an element may be a disk sector (512 bytes, for example), but an element may be larger or smaller than 512 bytes, and an element may be larger or smaller than a disk sector; (v) any type of storage array containing one or more SSDs; (vi) any of the alternative implementations of the other algorithms in this description.

As a side note, the reader is cautioned that superblock is used in other contexts (filesystems and NAND flash being examples), but the contexts are close enough that confusion might result if not for this warning. The superblock described here is a collection of disk sectors (block being a common alternative term for disk sector).

The ideas of Algorithm 3 include that a storage array controller: (i) maintains a map between host and disk (i.e. maps or re-maps data), (ii) performs garbage collection, and (iii) autonomously issues disk trim commands directed to superblocks. The storage array controller presents all write and erase operations (including disk trim commands) to an SSD at the granularity of a superblock, and this greatly helps the SSD perform its functions, including the garbage collection process of the SSD. Other implementations of Algorithm 3, with other features, are possible without altering these ideas.

Storage Array Controller with Asynchronous Garbage Collection

We will now describe Algorithm 4, which is based on Algorithm 3 and contains the majority of the logic required by a storage array controller. Algorithm 4 includes a detailed implementation of an example garbage collection process. Note that many (or indeed any) garbage collection algorithms may be used. Each major step below is a separate stage of operation: steps 4.1, 4.2, 4.3, 4.4, 4.5, and 4.6 correspond to: (i) initialization of the storage device or array; (ii) creation of LUNs; (iii) handling of write commands; (iv) deletion of LUNs; (v) increasing LUN size; (vi) decreasing LUN size.

Algorithm 4: Storage_Controller_1

Step 4.1: Initialization: issue disk trim commands to all ABAs on all disks // Nothing on disk(s)
Step 4.2: LUN creation: set LUN_size = C2
Step 4.3: Write Loop: while there are write commands:
Step 4.3.1: get_write_aba(hba) // pop from aba_free_1 & push to aba_free_2
Step 4.3.2: if threshold_reached() go to Step 4.3.3 else go to Step 4.3.1
Step 4.3.3: update aba_free_1(); go to Step 4.3.1 // start using area An+3
Step 4.4: LUN deletion:
Step 4.4.1: Issue disk trim commands to all ABAs that are mapped to the LUN
Step 4.4.2: Remove all ABA mappings for the LUN and add the ABAs to the freelist aba_free_1
Step 4.5: LUN increase size: no action required
Step 4.6: LUN decrease size:
Step 4.6.1: Issue a disk trim command specifying all ABAs that are mapped to the LUN region being removed
Step 4.6.2: Remove all ABA mappings for the LUN region being removed and add the ABAs to the freelist aba_free_1
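A minimal sketch of the write loop of Step 4.3 in Python, assuming two freelists and a simple count of ABAs used as the threshold test (the pool-rotation policy of FIG. 5 is simplified here; refilling from the front of the secondary freelist leaves the most recently dirtied ABAs, the Dirty Area, untouched):

    def make_dual_freelist_writer(hr_map, primary, secondary, threshold=4):
        """Step 4.3: pop new ABAs from aba_free_1, push old ABAs to aba_free_2."""
        used = 0

        def get_write_aba(hba):
            nonlocal used
            old_aba = hr_map.get(hba)
            new_aba = primary.pop(0)         # Step 4.3.1: pop from aba_free_1
            hr_map[hba] = new_aba
            if old_aba is not None:
                secondary.append(old_aba)    # Step 4.3.1: push to aba_free_2
            used += 1
            if used >= threshold:            # Step 4.3.2: threshold_reached()
                primary.extend(secondary[:threshold])  # Step 4.3.3: refill primary
                del secondary[:threshold]
                used = 0
            return new_aba

        return get_write_aba

With the FIG. 5 starting state (primary = ABAs 00-11, secondary = ABAs 12-15) and a threshold of four, the first refill moves the clean ABAs 12-15 onto the Primary Freelist while any recently dirtied ABAs remain on the Secondary Freelist.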

FIG. 5 illustrates the write loop of Step 4.3. The storage array controller in FIG. 5 may use the entire disk capacity so that an SSD may perform more efficient garbage collection. The main elements of FIG. 5 are similar to those of FIG. 1. Several components of FIG. 1 that are not central to Step 4.3 of Algorithm 4 have been omitted from FIG. 5 for clarity. In FIG. 5 the Sectors 514 are shown in an ordered manner so as to simplify the description, but the Storage Array Controller Logic 112 may re-order the physical disk sector locations. Thus the Sectors 514 shown in the various parts of FIG. 5 should be viewed as logical disk sectors rather than physical disk sectors.

In FIG. 5 the Solid-State Disk Capacity 516 is 16 disk sectors (C1). In FIG. 5 the Solid-State Disk LUN Size 518 as reported to the OS is 12 disk sectors (C2). In FIG. 5 there are two freelists: the Primary Freelist using data structure aba_free_1 and the Secondary Freelist using aba_free_2. After Steps 4.1 and 4.2, Primary Freelist (a) 532 contains ABAs 00-11 and Secondary Freelist (a) 534 contains ABAs 12-15. Step 4.3.1 uses free ABAs from the Primary Freelist (pop), but returns free ABAs to the Secondary Freelist (push).

In FIG. 5 Primary Freelist (b) 536, ABAs 00-03 have been removed, and four disk sectors in Area 0 520 (A0) have been written and marked U for used. During these writes Secondary Freelist (b) 538 is unchanged, which allows Solid-State Disk (1) 116 to perform garbage collection more efficiently on Area 3 526 (A3), marked G for garbage. As we continue to write to Area 0 520, Area 1 522 (A1), and Area 2 524 (A2) (but not to Area 3 526), we return free ABAs to the Secondary Freelist. In FIG. 5 ABAs 04-06 have been removed from Primary Freelist (c) 540. In FIG. 5 ABAs 00-03 have been added to Secondary Freelist (c) 542. In FIG. 5 the area marked Dirty Area 528 contains ABAs 00-03 and is marked G. This Dirty Area 528 will now remain on the Secondary Freelist and allow Solid-State Disk (1) 116 to perform its own garbage collection more efficiently.

Next, assume that threshold_reached is now true in Step 4.3.2. For example, we can count the ABAs used and set a threshold at four. In FIG. 5, as a result of Step 4.3.3, four ABAs 12-15 were removed from Secondary Freelist (d) 546 and added to Primary Freelist (d) 544 as the Clean Area 530, marked F for free. We continue in this fashion: we add ABAs to the Secondary Freelist one-by-one and later transfer them to the Primary Freelist in a large pool.
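Continuing the sketch above, the FIG. 5 sequence might be replayed as follows. This is illustrative only: the exact moment the threshold fires is a design choice, per alternative (vii) listed below.

```python
# Replaying the FIG. 5 sequence with the sketch above (illustrative only).
for hba in range(4):        # first writes: ABAs 00-03 leave the primary freelist
    get_write_aba(hba)
for hba in range(4):        # rewrites: ABAs 04-07 are used and ABAs 00-03
    get_write_aba(hba)      # retire to the secondary freelist (the Dirty Area)
if threshold_reached():     # four ABAs retired: Step 4.3.2 fires
    update_aba_free_1()     # Step 4.3.3: ABAs 12-15 become the Clean Area
print(list(aba_free_1))     # -> [8, 9, 10, 11, 12, 13, 14, 15]
```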

One idea of Algorithm 4 is to allow the storage array controller to manage writing to a large and rotating pool of dirty sectors. The result is that an SSD controller (under or below the storage array controller hierarchically, i.e. closer to the storage devices) may perform its own, more efficient, garbage collection and clean large dirty areas of flash blocks and flash pages.

Alternative implementations for Algorithm 4 may include one or more of the following: (i) the capacities, the numbers of disk sectors, and the sizes of the pools and areas described are many orders of magnitude higher in practice: C1 may be 100 GB and C2 may be 80 GB, for example; (ii) instead of a single LUN C2 we can use multiple LUNs: C2, C3, . . . , Ci, and then Step 4.2 will check that the sum of Ci is less than C1; (iii) other algorithms may be used to set the area of dirty sectors: a fixed pool (rather than rotating), or multiple pools, might be used, for example; (iv) other algorithms may be used to set the threshold(s), pool size(s), and location(s); (v) the freelist(s) may be of various relative sizes, split, and maintained in different ways that may improve the efficiency and speed of the algorithm; (vi) in Step 4.3.3 we change to use area An+3 (modulo 4, or more generally the number of areas: thus if we were using Area 0 (A0), we change to Area 3 (A3); from Area 2 (A2) we change to Area 1 (A1), etc.); this example assumes we have four areas, but the algorithm may use any number of areas (see the sketch after this list); (vii) the threshold of the test in Step 4.3.2 may be set using the number of writes performed, the number of ABAs used, or any other method; (viii) Step 4.1 may autonomously issue a standard ATA secure erase command to all disks (this will typically mark all ABAs as free, but may also erase SSD wear-leveling and other housekeeping data); (ix) Step 4.1 may autonomously issue a secure erase command that does not erase wear-leveling data; (x) any of the alternative implementations of the other algorithms in this description may be used.
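Item (vi) reduces to a one-line modulo computation; a minimal sketch:

```python
# Item (vi): rotating the active area, A(n+3) modulo the number of areas.
NUM_AREAS = 4  # FIG. 5 uses four areas; any number may be used

def next_area(n):
    """From Area n, change to Area (n + 3) mod NUM_AREAS."""
    return (n + 3) % NUM_AREAS

print(next_area(0))  # -> 3 (from Area 0 change to Area 3)
print(next_area(2))  # -> 1 (from Area 2 change to Area 1)
```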

Storage Array Controller for Large Capacity SSDs

We have presented Algorithms 1, 2, 3, and 4 using small disks as examples, with correspondingly small numbers, to simplify the descriptions. We now describe Algorithm 5 as an example of a storage array controller for use with one or more solid-state disks using components typical of the 2010 timeframe. Algorithm 5, described below, may be viewed as a combination of previously described algorithms. This implementation will thus illustrate ideas already described, but in a more realistic and contemporary context.

FIG. 6 shows the structure of the storage in a 64-Gbyte SSD. The main elements of FIG. 6 are similar to FIG. 1 and other previous figures. In FIG. 6 Solid-State Disk (1) 116 contains a Solid-State Disk Controller Chip 118 and Flash Memory 122. In FIG. 6 Flash Memory 122 consists of eight 64-Gbit Flash Devices 604. The 64-Gbit Flash Devices 604 each consist of 2 k (2048) 4-Mbyte Flash Blocks 606. The 4-Mbyte Flash Blocks 606 each consist of 512 8-kbyte Flash Pages 608. The 8-kbyte Flash Pages 608 each consist of 16 512-byte Disk Sectors 610. Solid-State Disk (1) 116 thus contains 8×2 k or 16 k (16384) flash blocks; 8×2 k×512 or 8 M (8388608) flash pages; and 8×2 k×512×16 or 128 M (134217728) disk sectors. These are practical numbers for a NAND flash device in the 2010 timeframe. For example, the Micron 32-Gbit NAND flash, part number MT29H32G08GCAH2, contains 8 k 512-kbyte flash blocks, with 128 4-kbyte flash pages per block.
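The FIG. 6 arithmetic may be checked directly; the following sketch uses only the counts given in the text:

```python
# Checking the FIG. 6 geometry of the 64-Gbyte SSD (numbers from the text).
DEVICES = 8                    # eight 64-Gbit Flash Devices 604
BLOCKS_PER_DEVICE = 2 * 1024   # 2 k 4-Mbyte Flash Blocks 606 per device
PAGES_PER_BLOCK = 512          # 512 8-kbyte Flash Pages 608 per block
SECTORS_PER_PAGE = 16          # 16 512-byte Disk Sectors 610 per page

blocks = DEVICES * BLOCKS_PER_DEVICE    # 16384 (16 k) flash blocks
pages = blocks * PAGES_PER_BLOCK        # 8388608 (8 M) flash pages
sectors = pages * SECTORS_PER_PAGE      # 134217728 (128 M) disk sectors
capacity = sectors * 512                # 68719476736 bytes = 64 Gbyte

print(blocks, pages, sectors, capacity)
```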

In FIG. 6 the IO Bus 106 communicates a Host Write Command (6) 612 to the Storage Array Controller 108. In FIG. 6 the Host Write Command (6) 612 uses an LBA that addresses 512-byte disk sectors. In FIG. 6, therefore, the Storage Array Controller 108 receives commands with a 512-byte disk sector granularity. In FIG. 6 the Storage Array Controller Chip 110 and the Storage Array Controller Logic 112 use a Superblock 614. In FIG. 6 the Superblock 614 consists of 128 k (131072) 512-byte Disk Sectors, so that Superblock 614 is 64 Mbytes. In FIG. 6 the Disk Write Command (6) 620 contains an ABA address of 0-134217727, aligned to a superblock boundary (a multiple of 128 k), that addresses a 512-byte disk sector. The Disk Write Command (6) 620 always uses a superblock of data in the RDATA field.

In FIG. 6 Map (6) 616 shows a list of ABAs, ordered by HBA. In FIG. 6 Map (6) 616 may thus have up to 134217728 rows (neglecting, for the moment, any ABAs on a freelist, which we have omitted from FIG. 6 to simplify the description). Since the integer 134217728 requires 27 binary bits, we may need a 4-byte (32-bit) field to store each of the ABA entries. In FIG. 6 Map (6) 616 would thus require up to 4 bytes×134217728, equal to 536,870,912 bytes or 512 Mbytes, to store the ABA information. This may be too much data to store economically. Using the concept of a superblock, we can simplify Map (6) 616.

FIG. 7 shows how we can simplify the map for a storage array controller attached to one or more large-capacity SSDs. FIG. 7 also illustrates how the storage array controller performs garbage collection by autonomously issuing disk trim commands directed to superblocks. The main elements of FIG. 7 are similar to the main elements in previous figures. We will use superblock address (SBA) for the address of a Superblock (7) 714. In FIG. 7 the Map (7) 716 contains HBAs that are addresses of 512-byte disk sectors and contains SBAs that are the addresses of 64-Mbyte superblocks. In FIG. 7 the Freelist (7) 718 contains 128 superblocks (labeled 000-127). The number of superblocks in the freelist will vary with time. In FIG. 7, at the instant in time shown, Map (7) 716 thus contains 134217728−(128×131072) or 117440512 rows. In FIG. 7 the Map (7) 716 thus contains (1024−128) or 896 SBAs that are in use. In FIG. 7 the Map (7) 716 is shown containing the ABA field, but the ABA may be calculated from the SBA and an Offset within the SBA: ABA=(SBA×131072)+Offset. The use of superblocks and the SBA allows the storage and manipulation of Map (7) 716 to be simplified in several ways, well known to someone skilled in the art, that are not a key part of the ideas presented here, but may allow these ideas to be implemented by other means.
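A short sketch of the map-size arithmetic and the SBA-to-ABA calculation; the helper name aba() is ours for illustration, not a name from the figures:

```python
# Map sizing for FIG. 6 / FIG. 7 (numbers from the text).
TOTAL_SECTORS = 134217728        # 128 M disk sectors on the 64-Gbyte SSD
SECTORS_PER_SUPERBLOCK = 131072  # 64-Mbyte superblock / 512-byte sectors

# Per-sector map: one 4-byte ABA entry per disk sector.
per_sector_map_bytes = 4 * TOTAL_SECTORS   # 536870912 bytes (512 Mbyte)

def aba(sba, offset):
    """ABA = (SBA x sectors-per-superblock) + Offset within the superblock."""
    return sba * SECTORS_PER_SUPERBLOCK + offset

total_superblocks = TOTAL_SECTORS // SECTORS_PER_SUPERBLOCK  # 1024 superblocks
print(per_sector_map_bytes, total_superblocks, aba(1, 0))    # -> ... 1024 131072
```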

In FIG. 7 the Host Write Command (7) 712 contains HDATA at 512-byte disk sector granularity. The storage array controller receives host write commands until the storage array controller has accumulated a Superblock (7) 714 worth of HDATA in a write buffer. The storage array controller then removes a superblock from the Freelist (7) 718. The storage array controller then updates Map (7) 716. The storage array controller then generates a Disk Write Command (7) 720 with a superblock of data. The storage array controller then performs garbage collection, as we have described above, possibly moving one or more old superblock(s) to the freelist. As a result of this garbage collection, the storage array controller may autonomously issue a Disk Trim Command (7) 722 directed to one (or more) old superblock(s) with starting addresses at one (or more) superblock-aligned ABA(s).

Algorithm 5: Storage_Controller_2 // Combination of Algorithms 3 and 4

Step 5.1: Initialization: issue a disk trim command to all ABAs on all disks // Nothing on disk
Step 5.2: LUN creation: set LUN_size = C2 // C2 < C1 = disk capacity
Step 5.3: get_write_aba_with_GC(hba) // Use Algorithm 3 or equivalent
Step 5.3.0: Write Loop. Process input host write commands. Go to Step 5.3.1.
Step 5.3.1: Host write command arrives at storage array controller. Storage array controller adds the host write command (HBA plus HDATA) to a write buffer. Go to Step 5.3.2.
Step 5.3.2: Check if the superblock write buffer is full. No: go to Step 5.3.1. Yes: go to Step 5.3.3.
Step 5.3.3: Check if we have enough ABAs in the freelist to fill a free superblock. No: go to Step 5.3.4. Yes: go to Step 5.3.5.
Step 5.3.4: Perform freelist_tidy to create a free superblock. Go to Step 5.3.5.
Step 5.3.5: Update hr_map. Go to Step 5.3.6.
Step 5.3.6: Transmit a disk write command from the superblock write buffer. Go to Step 5.3.7.
Step 5.3.7: End of Write Loop. Go to Step 5.3.0.
Step 5.4: LUN deletion:
Step 5.4.1: Issue a disk trim command to all ABAs that are mapped to the LUN
Step 5.4.2: Remove all ABA mappings for the LUN and add the ABAs to the freelist aba_free_1
Step 5.5: LUN increase size: no action required
Step 5.6: LUN decrease size:
Step 5.6.1: Issue a disk trim command specifying all ABAs that are mapped to the LUN region being removed
Step 5.6.2: Remove all ABA mappings for the LUN region being removed and add the ABAs to the freelist aba_free_1
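A minimal Python sketch of the Steps 5.3.0-5.3.7 write loop follows, under illustrative assumptions: a tiny superblock of four sectors, a stubbed freelist_tidy, and a print statement standing in for transmission of the disk write command. The structure names follow the step names above; everything else is ours.

```python
from collections import deque

# Sketch of the Algorithm 5 write loop (Steps 5.3.0-5.3.7); illustrative only.
SECTORS_PER_SUPERBLOCK = 4      # tiny value for illustration only

superblock_write_buffer = []    # accumulates (HBA, HDATA) pairs
aba_free_1 = deque(range(64))   # freelist of ABAs
hr_map = {}                     # hr_map: host block address -> ABA

def freelist_tidy():
    """Step 5.3.4 stub: garbage collect to create a free superblock; a real
    implementation would also autonomously issue disk trim commands directed
    to old superblocks, as described above."""
    pass

def host_write(hba, hdata):
    superblock_write_buffer.append((hba, hdata))           # Step 5.3.1
    if len(superblock_write_buffer) < SECTORS_PER_SUPERBLOCK:
        return                                             # Step 5.3.2: not full
    if len(aba_free_1) < SECTORS_PER_SUPERBLOCK:           # Step 5.3.3
        freelist_tidy()                                    # Step 5.3.4
    abas = [aba_free_1.popleft() for _ in range(SECTORS_PER_SUPERBLOCK)]
    for (hba_i, _), aba_i in zip(superblock_write_buffer, abas):
        hr_map[hba_i] = aba_i                              # Step 5.3.5
    print("disk write command: superblock at ABA", abas[0])  # Step 5.3.6
    superblock_write_buffer.clear()                        # Step 5.3.7

for i in range(8):              # two superblocks' worth of host writes
    host_write(i, b"x")
```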

In FIG. 7 the 64-Mbyte Superblock 614 consists of 16 4-Mbyte Flash Blocks. The 64-Gbit Flash Devices 604 (and NAND flash devices in general) typically only permit an erase to be performed a flash block at a time, while writes are performed a flash page at a time. In FIG. 7 the Disk Trim Command (6) 622 contains a start ABA address of 0-134217727 that is aligned to a superblock boundary and addresses a 512-byte disk sector. The Disk Trim Command (6) 622 always specifies a superblock. Thus, as shown in FIG. 7, the Solid-State Disk (1) 116 always receives write commands and trim commands with a superblock granularity, and thus the SSD may perform its own functions (e.g. write, erase, garbage collection, etc.) much more efficiently.

Alternative implementations for Algorithm 5 may include one or more of the following: (i) other sizes of superblock; (ii) multiple superblock sizes; (iii) any type of storage array containing one or more SSDs; (iv) any of the alternative implementations of the other algorithms in this description.

FIG. 8 shows a screenshot of a BIOS Configuration Utility for a storage array controller. The layout, contents, and functions shown are illustrative: other names for the functions may be used, a different layout or series of screen layouts may be used, commands may instead be performed on a Linux or DOS command line or equivalent, etc. In FIG. 8 Screen 800 contains the following options for BIOS Configuration Utility 802: Initialize 804, Create 806, Remove 808, and Re-Size 810. These example options may correspond, for example, to Steps 5.1, 5.2, 5.4, and 5.5/5.6 of Algorithm 5. Such operations may be performed on one or more LUNs that are part of a storage array, or on the entire storage array. Thus part or all of the steps and functions described in the algorithms presented here may be performed in BIOS as part of a Configuration Utility or as part of other software utilities. For example, selecting Initialize 804 (by using the arrow keys, for example) in Screen 800 and pressing the Enter key may cause a disk trim command to be issued to all solid-state disks in an array. Other BIOS Configuration Utility options may similarly cause one or more disk trim commands to be issued, as has been described.

CONCLUSION

Numerous variations and modifications based on the above description will become apparent to someone with skill in the art once the above description is fully understood. It is intended that the claims that follow be interpreted to embrace all such variations and modifications.

REFERENCE SIGNS LIST: 102 Host System; 104 CPU; 106 IO Bus; 108 Storage Array Controller; 110 Storage Array Controller Chip; 112 Storage Array Controller Logic; 114 Storage Bus; 116 Solid-State Disk (1); 118 Solid-State Disk Controller Chip; 120 Solid-State Disk Logic; 122 Flash Memory; 124 Disk Sector (00); 126 Disk Sector (15); 128 Other Storage Array Devices; 130 Flash Page; 132 Flash Block; 134 Disk Sectors; 136 Map (1); 138 Freelist (1); 140 Disk Commands (1); 142 Disk Trim Command (1); 144 Autonomous Disk Trim Command; 146 Storage Subsystem; 148 Storage Array; 150 Computer System; 152 Solid-State Disk (2); 154 Hard Disk (1); 156 Hard Disk (2); 158 Operating System; 214 Serial Storage Bus (1); 216 Serial Storage Bus (2); 218 Serial Storage Bus (3); 226 File System; 228 Device Driver; 230 Solid-State Disk (3); 232 Solid-State Disk (4); 234 Hard Disk (3); 236 Device Driver Logic; 238 Software; 240 Software Bus; 242 Hypervisor; 244 Software Bus (2); 246 Hyperdriver; 248 VMkernel; 300 Host Write Commands (3); 302 Map (3a); 304 Map (3b); 306 Disk Commands (3); 308 Freelist (3a); 310 Freelist (3b); 312 Erased Flash Block; 314 Write to Flash Page; 316 Solid-State Disk Data Structure; 318 Disk Write Command (3); 320 Disk Trim Command (3); 400 Flow Chart; 402 Data Structures; 404 Host Write Command (4); 406 Superblock Write Buffer; 410 Disk Write Command (4); 412 Map (4); 414 Free Superblock (1); 416 Freelist (4); 418 Free Superblock (2); 420 Disk Trim Command (4); 514 Sectors; 516 Solid-State Disk Capacity; 518 Solid-State Disk LUN Size; 520 Area 0; 522 Area 1; 524 Area 2; 526 Area 3; 528 Dirty Area; 530 Clean Area; 532 Primary Freelist (a); 534 Secondary Freelist (a); 536 Primary Freelist (b); 538 Secondary Freelist (b); 540 Primary Freelist (c); 542 Secondary Freelist (c); 544 Primary Freelist (d); 546 Secondary Freelist (d); 604 64-Gbit Flash Devices; 606 4-Mbyte Flash Blocks; 608 8-kbyte Flash Pages; 610 512-byte Disk Sectors; 612 Host Write Command (6); 614 Superblock; 616 Map (6); 620 Disk Write Command (6); 622 Disk Trim Command (6); 712 Host Write Command (7); 714 Superblock (7); 716 Map (7); 718 Freelist (7); 720 Disk Write Command (7); 722 Disk Trim Command (7); 800 Screen; 802 BIOS Configuration Utility; 804 Initialize; 806 Create; 808 Remove; 810 Re-Size

What we claim is:
1. A method of managing a storage array comprising: a storage array controller that is operable to receive one or more host commands from an operating system; wherein the one or more host commands are directed to one or more solid-state storage devices in the storage array; wherein the storage array controller is operable to generate one or more disk trim commands in response to the one or more host commands; wherein the generating one or more disk trim commands is performed in an autonomous manner; and wherein the one or more disk trim commands are directed to at least one of the one or more solid-state storage devices.
2. The method of claim 1 wherein the operating system is not operable for generating the one or more disk trim commands.
3. The method of claim 1 wherein the generating one or more disk trim commands further comprises merging one or more host trim commands into the one or more disk trim commands.
4. The method of claim 1 wherein the receiving host commands further comprises: updating a map from a plurality of host block addresses to a plurality of array block addresses; and placing one or more old array block addresses in the one or more disk trim commands.
5. The method of claim 1 wherein the managing a storage array is performed in software.
6. The method of claim 1 wherein the managing a storage array is performed in software in a hypervisor.
7. The method of claim 1 wherein the managing a storage array further comprises: maintaining one or more maps and one or more freelists; performing garbage collection on at least one of the one or more maps and one or more freelists as a result of the receiving of the one or more host commands; generating one or more superblocks; and placing one or more superblock addresses of the one or more superblocks in the one or more disk trim commands.
8. A storage array controller operable to be coupled to a host system and a storage array; wherein the storage array includes a plurality of storage devices; wherein the plurality of storage devices includes at least one solid-state storage device; wherein the storage array controller is operable to receive host commands from the host system; and wherein the storage array controller is operable to autonomously issue a disk trim command to the at least one solid-state storage device.
9. The storage array controller of claim 8 wherein the storage array controller maintains a map and a freelist; wherein the map converts host block addresses to array block addresses; and wherein the freelist includes a plurality of free array block addresses.
10. The storage array controller of claim 9 wherein the storage array controller is operable to place one or more of the plurality of free array block addresses in the disk trim command.
11. The storage array controller of claim 9 wherein the storage array controller issues a disk trim command to array block addresses that are not in the map.
12. The storage array controller of claim 9 wherein the storage array controller creates one or more old array block addresses; and wherein the storage array controller issues disk trim commands to the one or more old array block addresses.
13. The storage array controller of claim 9 wherein the storage array controller performs garbage collection.
14. The storage array controller of claim 9 wherein the storage array controller collects write commands into one or more superblocks; and wherein the storage array controller writes to one or more of the at least one solid-state disks using the one or more superblocks.
15. The storage array controller of claim 8 wherein the disk trim command is generated in a device driver.
16. The storage array controller of claim 15 wherein the device driver is part of a host system.
17. The storage array controller of claim 15 wherein the device driver is part of a hypervisor.
18. The storage array controller of claim 8 wherein the storage capacity presented to the host system (C1) is less than the storage array capacity (C2); wherein the storage array capacity (C2) minus the storage capacity presented to the host system (C1) is a portion of storage capacity (C2−C1); and wherein the storage array controller autonomously issues a trim command to the portion of storage capacity (C2−C1).
19. The storage array controller of claim 8 wherein the storage array controller issues a disk trim command during an operation selected from the following: storage array initialization, storage array creation, storage array resizing, LUN creation, LUN removal, LUN resizing, LUN deletion.
20. A computer system for storing and providing data; the computer system operable to be coupled to a storage array controller; the storage array controller operable to be coupled to a storage array; the storage array including a plurality of storage devices; the plurality of storage devices including at least one solid-state storage device; and wherein the storage array controller is operable to autonomously issue a disk trim command to one or more of the at least one solid-state storage devices.